Dual Teaching: A Practical Semi-supervised Wrapper Method
نویسندگان
چکیده
Semi-supervised wrapper methods are concerned with building effective supervised classifiers from partially labeled data. Though previous works have succeeded in some fields, it is still difficult to apply semi-supervised wrapper methods to practice because the assumptions those methods rely on tend to be unrealistic in practice. For practical use, this paper proposes a novel semi-supervised wrapper method, Dual Teaching, whose assumptions are easy to set up. Dual Teaching adopts two external classifiers to estimate the false positives and false negatives of the base learner. Only if the recall of every external classifier is greater than zero and the sum of the precision is greater than one, Dual Teaching will train a base learner from partially labeled data as effectively as the fully-labeled-data-trained classifier. The effectiveness of Dual Teaching is proved in both theory and practice.
منابع مشابه
Populating Ontologies with Data from OCRed Lists
A flexible, accurate, and efficient method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selectio...
متن کاملPopulating Ontologies by Semi-automatically Inducing Information Extraction Wrappers for Lists in OCRed Documents
A flexible, accurate, and efficient method of extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine queryable, linkable, and editable. But, to work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selection of human guidance. We propose a wrapper-induction solution for...
متن کاملForward Semi-supervised Feature Selection
Traditionally, feature selection methods work directly on labeled examples. However, the availability of labeled examples cannot be taken for granted for many real world applications, such as medical diagnosis, forensic science, fraud detection, etc, where labeled examples are hard to find. This practical problem calls the need for “semi-supervised feature selection” to choose the optimal set o...
متن کاملSemi-Supervised Web Wrapper Repair via Recursive Tree Matching
Continuous data extraction pipelines using wrappers have become common and integral parts of businesses dealing with stock, flight, or product information. Extracting data from websites that use HTML templates is difficult because available wrapper methods are not designed to deal with websites that change over time (the inclusion or removal of HTML elements). We are the first to perform large ...
متن کاملPopulating Ontologies with Data from Lists in Family History Books
A flexible, accurate, and cost-effective method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its sel...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1611.03981 شماره
صفحات -
تاریخ انتشار 2016